Identification of SLC40A1, LCN2, CREB5, and SLC7A11 as ferroptosis-related biomarkers in alopecia areata through machine learning

Alopecia areata (AA) is a common non-scarring hair loss condition driven by the collapse of immune privilege and oxidative stress. The role of ferroptosis, a type of cell death linked to oxidative stress, has yet to be explored in AA, even though it is implicated in various diseases. Using transcriptome data from AA patients and controls from datasets GSE68801 and GSE80342, we aimed to identify AA diagnostic marker genes linked to ferroptosis. We employed single-sample gene set enrichment analysis (ssGSEA) to evaluate immune cell infiltration. Correlations between ferroptosis-related differentially expressed genes (FRDEGs) and immune cells/functions were identified using Spearman analysis. Feature selection was performed with support vector machine-recursive feature elimination (SVM-RFE) and LASSO regression models. Validation was performed using the GSE80342 dataset, followed by hierarchical internal validation. We also constructed a nomogram to assess the predictive ability of FRDEGs in AA. Furthermore, the expression and distribution of these molecules were confirmed through immunofluorescence. Four genes, namely SLC40A1, LCN2, CREB5, and SLC7A11, were identified as markers for AA. A prediction model based on these genes showed high accuracy (AUC = 0.9052). Immunofluorescence revealed reduced expression of these molecules in AA patients compared to normal controls (NC), with SLC40A1 and CREB5 showing significant differences. Notably, they were primarily localized to the outer root sheath and in proximity to the sebaceous glands. Our study identified several ferroptosis-related genes associated with AA. These findings, emerging from the integration of immune cell infiltration analysis and machine learning, contribute to the evolving understanding of diagnostic and therapeutic strategies in AA. Importantly, this research lays a foundation for subsequent studies exploring the relationship between AA and ferroptosis.


Materials and methods
The study's flowchart is illustrated in Fig. 1.

Data acquisition and processing
Two AA datasets (GSE68801 and GSE80342) were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The GSE68801 dataset included 122 samples (60 AA lesions and 62 normal controls (NC)), while GSE80342 contained 15 samples (12 AA lesions and 3 NC). Expression matrix identifiers were converted from 'probe id' to 'symbol' using platform annotation files. Whenever multiple probes matched a single gene symbol, their mean value was used as the expression level. To correct batch effects, we employed the ComBat function from the R "sva" package. While GSE68801 was the main dataset for analysis, GSE80342 was reserved for external validation.
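The probe-to-symbol averaging step can be illustrated with a minimal Python sketch. It is not the code used in the study (the analysis was run in R); the function name, the probe identifiers, and the expression values are hypothetical and serve only to show the averaging rule.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(expression, probe_to_symbol):
    """Average probe-level rows that map to the same gene symbol.

    expression: dict of probe id -> list of per-sample values
    probe_to_symbol: dict of probe id -> gene symbol (hypothetical annotation)
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:  # probes without an annotated symbol are skipped
            grouped[symbol].append(values)
    # zip(*rows) walks the samples column by column
    return {s: [mean(col) for col in zip(*rows)] for s, rows in grouped.items()}


# Toy values for illustration only (two samples, three probes)
expression = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 6.0],
    "probe_3": [1.0, 1.5],
}
probe_to_symbol = {"probe_1": "SLC7A11", "probe_2": "SLC7A11", "probe_3": "LCN2"}
collapsed = collapse_probes(expression, probe_to_symbol)
```

In this toy input the two probes annotated to the same symbol are merged into one row holding their per-sample means, while the single-probe gene is passed through unchanged.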

Identification of candidate marker genes
We applied two machine learning techniques to identify potential AA marker genes. First, using the "glmnet" R package, we conducted LASSO regression analysis with tenfold cross-validation to select the genes most informative for distinguishing AA from NC 25. Second, we employed a support vector machine (SVM), a supervised machine learning model. To mitigate the risk of overfitting, recursive feature elimination (RFE) was used to optimize the gene set, yielding the SVM-RFE procedure.

Nomogram construction
Using the R "rms" package, we constructed a nomogram integrating the marker genes. Calibration curves were used to assess its accuracy.

Immunofluorescence
Scalp tissues from both AA patients and healthy controls were fixed overnight at 4 °C in 4% PFA, subsequently embedded in paraffin, and sectioned. For the preparation of hair follicle sections, we consistently adopted an oblique sectioning angle of 45 degrees relative to the plane of the scalp. This approach was employed to maximize the visibility of hair follicles, allowing for an enhanced assessment of both the follicles and their surrounding as well as intrinsic parameters. Sections were blocked with bovine serum albumin and incubated overnight at 4 °C with antibodies against SLC40A1 (26601-1-AP, Proteintech), LCN2 (26991-1-AP, Proteintech), CREB5 (14196-1-AP, Proteintech), and SLC7A11 (26864-1-AP, Proteintech). After primary antibody exposure, the samples were treated with Alexa Fluor 488-conjugated anti-rabbit IgG (Invitrogen) for 1 h at room temperature, followed by nuclear counterstaining with 4',6-diamidino-2-phenylindole (DAPI; Beyotime). Specimens were visualized and captured using a fluorescence microscope (Olympus).
To quantify the relative fluorescence area (%), the captured images were analyzed using ImageJ software (NIH). The total area of fluorescence was measured for each marker, and this value was expressed as a percentage of the total area of the tissue section under observation. This approach standardized the fluorescence measurements across different samples.
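The relative fluorescence area described above amounts to a ratio of pixel counts. The Python sketch below illustrates that ratio under stated assumptions: it is not the ImageJ workflow itself, and the intensity threshold, the tissue mask, and the pixel values are all hypothetical, since the thresholding settings are not reported here.

```python
def relative_fluorescence_area(image, tissue_mask, threshold):
    """Percentage of tissue pixels whose intensity exceeds the threshold.

    image: 2D list of pixel intensities for one marker channel
    tissue_mask: 2D list of 0/1 flags marking pixels inside the tissue section
    threshold: intensity cutoff for a "fluorescent" pixel (assumed value)
    """
    tissue = 0
    positive = 0
    for image_row, mask_row in zip(image, tissue_mask):
        for value, in_tissue in zip(image_row, mask_row):
            if in_tissue:
                tissue += 1
                if value > threshold:
                    positive += 1
    if tissue == 0:
        raise ValueError("tissue mask is empty")
    return 100.0 * positive / tissue


# Toy 2 x 4 image; the bright pixel (250) lies outside the tissue mask
image = [[10, 200, 180, 250],
         [0, 220, 30, 15]]
tissue_mask = [[1, 1, 1, 0],
               [0, 1, 1, 1]]
area_percent = relative_fluorescence_area(image, tissue_mask, threshold=100)
```

Normalizing by the tissue area rather than by the full image keeps sections of different sizes comparable, which is the purpose of expressing the measurement as a percentage.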
To ensure the integrity and stability of the fluorescent signals, experimental procedures involving the handling and processing of fluorescently labeled samples were conducted in a darkroom setting. Prior to tissue collection, informed consent was obtained from all participants. This study was approved by the Medical Ethics Committee of the Third People's Hospital of Hangzhou (Approval No.: 2022KA058). Both patients and healthy controls were aged 40-60 years. Patients exhibited hair loss covering 40-60% of the scalp area, and none had undergone any treatments in the preceding 6 months.

Ethics approval and consent to participate
The study was conducted in accordance with the ethical standards of the Declaration of Helsinki and its later amendments. It was approved by the Medical Ethics Committee of the Third People's Hospital of Hangzhou (Approval No.: 2022KA058). Informed consent was obtained from all individual participants included in the study. All methods were carried out in accordance with relevant guidelines and regulations.

Statistical analysis
In our differential gene analysis, we applied False Discovery Rate (FDR) correction to adjust p-values for multiple testing. This controls the expected proportion of false positives among the reported DEGs, enhancing their reliability. For external validation in the GSE80342 dataset, we used Student's t-tests to compare gene expression between AA and NC. Correlation coefficients were determined by Spearman correlation analysis. The R "pROC" package was used to plot ROC curves and compute AUC values, gauging the classification efficacy of our predictive model for AA versus NC. All statistical computations were conducted using R software 4.1.1 and Prism (GraphPad Prism, USA). Statistical significance was denoted as *p < 0.05, **p < 0.01, ***p < 0.001, or ****p < 0.0001.
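As an illustration of FDR adjustment, the Python sketch below implements the Benjamini-Hochberg step-up procedure. The specific FDR method is not named above, so treating it as Benjamini-Hochberg is an assumption, and the p-values are toy numbers rather than study results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for illustration only
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
```

Each raw p-value is scaled by m/rank, and the running minimum prevents a smaller raw p-value from receiving a larger adjusted value than one ranked above it.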

Identification of DEGs and FRDEGs
A volcano plot (Fig. 2a) revealed 633 DEGs between AA and NC in GSE68801, with 335 down-regulated (green dots) and 298 up-regulated (red dots), as detailed in Table S3. A Venn diagram (Fig. 2b) comparing FRGs to DEGs in GSE68801 identified 6 overlapping FRDEGs for further analysis, with 4 down-regulated and 2 up-regulated. A hierarchically clustered heatmap (Fig. 2c) depicted the expression matrix of FRDEGs in GSE68801.
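The Venn-diagram step is a set intersection followed by a split on the sign of the fold change. A minimal Python sketch follows; the gene names and log fold-change values are placeholders, not data from GSE68801, and the function name is hypothetical.

```python
def split_overlap(deg_log_fc, ferroptosis_genes):
    """Intersect DEGs with a ferroptosis gene list and split by direction.

    deg_log_fc: dict of gene symbol -> log2 fold change (AA versus NC)
    ferroptosis_genes: iterable of ferroptosis-related gene symbols
    """
    overlap = set(deg_log_fc) & set(ferroptosis_genes)
    up = sorted(g for g in overlap if deg_log_fc[g] > 0)
    down = sorted(g for g in overlap if deg_log_fc[g] < 0)
    return up, down


# Placeholder symbols and fold changes for illustration only
deg_log_fc = {"GENE_A": 1.4, "GENE_B": -1.1, "GENE_C": -2.0, "GENE_D": 0.9}
ferroptosis_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_X"]
up_regulated, down_regulated = split_overlap(deg_log_fc, ferroptosis_genes)
```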

Immune cell infiltration analysis in AA
We applied the ssGSEA algorithm for a comparative analysis of immune cell infiltration in AA lesions versus NC scalp. The analysis incorporated 16 types of immune cells (imc) and 13 types of immune functions (imf) across 60 AA samples, as shown in Fig. 3a. Correlations between imc and imf proportions were further explored in Fig. 3b,c. Notably, Fig. 3d,e reveal a pronounced increase in various imc and imf in AA compared to NC. Figure 3d illustrates marked increases in specific imc subsets, particularly CD8+ T cells, in AA lesions, with the exception of T follicular helper (Tfh) and regulatory T cells (Treg). Figure 3e highlights enhanced immune functions in AA, with notable upregulation in cytolytic activity and inflammation promotion. These findings underscore the extensive inflammatory infiltration characterizing AA.

Analysis of FRDEGs combined with immune cell infiltration
For correlation analysis, we assembled data from six FRDEGs, 16 imcs, and 13 imfs associated with immune cell infiltration and generated a corresponding heatmap. A strong correlation between these genes and immune cell infiltration was observed. Among them, ALOX12B and ALOX15 were positively correlated with imc and imf, while SLC40A1, LCN2, CREB5, and SLC7A11 were predominantly negatively correlated (Fig. 4).
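Each cell of such a heatmap is a Spearman coefficient between one gene's expression and one ssGSEA score across samples. The Python sketch below computes a single coefficient as the Pearson correlation of average ranks; it is an illustration with toy vectors, not the R code used in the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Toy vectors: a gene's expression and an immune score in five samples
gene_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
immune_score = [10.0, 9.0, 7.0, 8.0, 1.0]
rho = spearman(gene_expression, immune_score)
```

A negative coefficient, as in this toy case, corresponds to the pattern reported for SLC40A1, LCN2, CREB5, and SLC7A11: expression falls as the immune score rises.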

Hierarchical and external validation of marker genes
To assess the diagnostic efficacy of the five marker genes, the AA samples from GSE68801 were categorized into two subgroups (SAA and NSAA) based on clinical parameters. The SAA group contained 32 cases, and the NSAA group had 28 cases. Only SLC40A1, LCN2, CREB5, and SLC7A11 showed significantly lower expression in the SAA group, while ALOX15 exhibited no significant difference between groups (Fig. 6a). To further validate the marker genes' diagnostic efficacy, we extracted gene expression data from GSE80342 (12 AA lesions and 3 NC scalps) and generated a violin plot (Fig. 6b). The expression of these four genes differed significantly between the AA and NC groups, with SLC40A1, LCN2, CREB5, and SLC7A11 displaying significantly lower expression in AA, consistent with the GSE68801 dataset.
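The group comparison used for external validation can be sketched as a pooled-variance Student's t statistic. The Python example below is illustrative only: the expression values are toy numbers, and the p-value is omitted because the Python standard library has no t-distribution function.

```python
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (equal variances) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Toy expression values for one gene (not study data)
aa_lesions = [1.0, 2.0, 3.0]
normal_controls = [2.0, 3.0, 4.0]
t_statistic, degrees_of_freedom = student_t(aa_lesions, normal_controls)
```

A negative statistic indicates lower mean expression in the first group. With only 3 NC samples in GSE80342, the degrees of freedom are small, so such comparisons have limited power.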

Development of a nomogram for predicting AA risk
To determine the predictive potential of the four identified marker genes for AA risk, we established a multivariate logistic model (Fig. 7a). In the nomogram, each marker gene was assigned a score, and the total score was calculated by summing the individual scores. The total score corresponded to the predicted AA risk. The calibration curve (Fig. 7b) was used to assess the nomogram's accuracy: the closer the bias-corrected line lies to the ideal line, the higher the prediction accuracy. Our model exhibited high accuracy. To evaluate the combined predictive capacity of these four marker genes, we generated a receiver operating characteristic (ROC) curve (Fig. 7c), which yielded an area under the curve (AUC) of 0.9052. This value indicated that the four marker genes provided strong predictive performance for AA.
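The two quantities in this section, a logistic risk score and the AUC, can be sketched in a few lines of Python. This is an illustration, not the fitted model: the coefficients, intercept, gene labels, and scores are hypothetical, and the AUC is computed in its pairwise (Mann-Whitney) form, which equals the area under the empirical ROC curve.

```python
import math


def logistic_risk(expression, coefficients, intercept):
    """Predicted probability from a multivariate logistic model."""
    linear = intercept + sum(
        coefficients[gene] * expression[gene] for gene in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical coefficients: a negative weight means lower expression
# raises the predicted risk
coefficients = {"gene_1": -1.0, "gene_2": 1.0}
risk = logistic_risk({"gene_1": 2.0, "gene_2": 2.0}, coefficients, intercept=0.0)

# Toy predicted risks for cases (AA) and controls (NC)
model_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

A nomogram is a graphical rescaling of the same linear predictor: each gene's contribution is converted to points, and the point total maps back to the predicted probability.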

Immunofluorescence
We evaluated the expression of the respective genes in scalp tissues from 3 AA patients and 3 NC. As depicted in Fig. 8, immunofluorescence staining confirmed reduced expression of SLC40A1, CREB5, LCN2, and SLC7A11 in the scalp tissues of AA patients compared to healthy controls, with SLC40A1 and CREB5 showing statistically significant differences. Furthermore, LCN2, CREB5, and SLC7A11 were predominantly localized to the outer root sheath, whereas SLC40A1 was primarily observed in the sebaceous gland region.

Discussion
In this investigation, using machine learning methods, we identified SLC40A1, LCN2, CREB5, and SLC7A11 as prospective diagnostic biomarkers for AA. To our knowledge, these findings are the first to link ferroptosis to the pathogenesis of AA, laying the groundwork for further exploration of the role of ferroptosis in this condition.
AA affects approximately 2% of the global population 8, with profound impacts on quality of life and potential psychological ramifications 26. Its links to cardiovascular disease underline the necessity for early diagnosis. Despite some molecular indications suggesting ferroptosis as a pathophysiological factor, the connection remained largely speculative until our study.
Excess iron is fundamental to the process of ferroptosis. Fe2+ catalyzes the formation of reactive oxygen species (ROS) through the Fenton reaction, leading to lipid peroxidation and the initiation of ferroptosis 27. Iron contributes to the generation of highly reactive and toxic hydroxyl radicals, thereby stimulating oxidative damage. Recent literature suggests that aberrations in iron metabolism may play a crucial role in skin aging 28. Studies employing iron chelators to prevent ROS generation have shown promise in reducing photodamage and aging-related oxidative stress 29. Research on ferroptosis in skin diseases has been limited, mostly focusing on autoimmune disorders such as systemic lupus erythematosus (SLE) and vitiligo. In these conditions, oxidative stress plays a dominant role, leading to the ferroptosis of neutrophils and melanocytes 30. However, research on the role of ferroptosis in AA, particularly its specific pathological contributions and potential therapeutic implications, remains relatively scarce.
Recent studies hint at a link between ferroptosis and immune cell activity, exemplified by the role of CD8+ T cells in intensifying ferroptosis in tumor cells 31. This connection is particularly pertinent to AA, where both ferroptosis and immune dysfunction appear intertwined. Our analysis examines this interplay, revealing new aspects of AA's immunopathology. Our data align with recent single-cell sequencing results 32, indicating that CD8+ T cells, marked by heightened cytolytic activity and inflammation, are significant drivers of the condition.
Previous studies primarily used bioinformatics to identify potential AA marker genes: Yuan X et al. pinpointed EOMES as a potential therapeutic target for AA 33, Zhang T et al. highlighted four key genes (CD28, HOXC13, KRTAP1-3, and GPRC5D) for AA treatment 34, and Zhang Z et al. identified three salient genes (BMP2, KRTs, and KRTAPs) in relation to AA 35. Our research differs from these in its focus on ferroptosis.
By integrating the GEO dataset, the FerrDb database, and machine learning techniques, we delineated five potential marker genes specifically associated with both ferroptosis and AA, adding a new dimension to the molecular understanding of the disease. Notably, four of these genes (SLC40A1, LCN2, CREB5, and SLC7A11) were found to be downregulated in AA lesions compared to NC scalps. Their expression levels inversely correlated with AA severity. These findings are further supported by the high accuracy of our ROC curves, suggesting their potential as AA biomarkers.
Recent studies have highlighted the role of SLC40A1, a protein localized to the cellular membrane, with a putative function in iron export from duodenal epithelial cells 36. This discovery broadens our understanding of iron metabolism, a key factor in ferroptosis. Another gene, LCN2, is known for encoding a protein that is primarily involved in lipid transport and innate immunity. It may also play a role in maintaining skin homeostasis and defending against pathogenic invasion 37. Interestingly, therapies targeting LCN2 have been shown to induce ferroptosis in tumor cells, offering new avenues for hepatocarcinoma treatment 37. The CREB5 gene is implicated in the PI3K-Akt signaling pathway 38. Its protein product, predominantly expressed in articular cartilage, acts as a transcription factor to enhance Prg4 expression. This activation is crucial for the TGF-β and EGFR pathways, both important in hair follicle morphogenesis 39. The role of CREB5 in the pathogenesis of AA, however, remains to be elucidated. Additionally, SLC7A11, a relative of SLC40A1, is involved in iron efflux and exosome-mediated ferritin output, which may inhibit ferroptosis. Its role as a potential marker in ferroptosis pathways, specifically in the context of AA, merits further investigation 40. While research on SLC7A11 is predominantly focused on oncology, its relevance to AA is an emerging area of interest, especially considering its function in amino acid transport, which may prevent lipid peroxidation and ferroptosis 41.
Immunofluorescence demonstrated diminished expression of SLC40A1, CREB5, LCN2, and SLC7A11 in the scalp tissues of AA patients compared to NC, with SLC40A1 and CREB5 showing statistically significant differences. The absence of statistical differences for some molecules might be attributed to the limited number of cases selected for immunofluorescence. Nevertheless, a decline in the expression of these molecules in AA was evident. Notably, these molecules were primarily localized in the outer root sheath and sebaceous gland regions, areas known for their metabolic activity. This suggests a potential link between AA onset and metabolic perturbations. The observed lower levels of LCN2 and SLC7A11 in AA, given the limited statistical power of our study, may not fully capture their role in the disease. These results, therefore, should be viewed as indicative rather than definitive, pending further investigation with larger cohorts and functional validation.
In conclusion, our findings point to a potential role of FRGs in the progression of AA. This not only enriches our understanding of AA's molecular underpinnings but also points toward promising diagnostic and therapeutic avenues. Our results call for more extensive studies aimed at personalized treatment for AA patients.

Figure 3. Immune Cell Infiltration Analysis in Alopecia Areata (AA). (a) Heatmap shows 16 immune cell types and 13 immune functions identified using the ssGSEA algorithm. The heatmap was generated using R software (version 4.1.1, https://www.r-project.org/). Adjustments to font type and size were made for enhanced clarity. (b) Correlation matrix displays the proportions of different immune cell types. (c) Correlation matrix presents the proportions of various immune functions. (d) Immune Cell Profile in AA and Normal Scalp Tissue: The box plot compares 16 immune cell (imc) populations between Alopecia Areata-affected scalp (AA, red boxes) and normal scalp from healthy controls and unaffected AA scalp (Normal, green boxes). Significant increases in imc subsets, notably CD8+ T cells, are observed in AA lesions, except for T follicular helper (Tfh) and regulatory T cells (Treg). Significance is marked by *p < 0.05, **p < 0.01, ***p < 0.001, with 'ns' for non-significant. The box encompasses the interquartile range with the median indicated by the horizontal line, and outliers as points. (e) Immune Function Enhancement in AA Scalp: The box plot shows a marked elevation of 13 immune function (imf) parameters in AA compared to normal scalp tissue. Enhancements in cytolytic activity and inflammation promotion are the most pronounced, highlighting their potential role in AA pathophysiology. Statistical significance is denoted by *p < 0.05, **p < 0.01, ***p < 0.001. The plot's box represents the interquartile range, the median is the line within, and points are outliers. AA, alopecia areata; imc, immune cell; imf, immune function; ssGSEA, single-sample gene set enrichment analysis; Tfh, T follicular helper cells; Treg, regulatory T cells.

Figure 4. Correlation Analysis of FRDEGs with Immune Cells (imc) and Immune Functions (imf) in Alopecia Areata (AA). This figure illustrates the correlation between ferroptosis-related differentially expressed genes (FRDEGs) and immune cell infiltration. Notably, ALOX12B and ALOX15 demonstrated a positive correlation with both imc and imf. In contrast, SLC40A1, LCN2, CREB5, and SLC7A11 showed a predominantly negative correlation with these parameters. AA, alopecia areata; imc, immune cells; imf, immune functions.

Figure 5. Diagnostic Marker Gene Identification for Alopecia Areata (AA). (a) Displays ten-fold cross-validation in the LASSO model, with each curve representing an individual gene. (b) Shows feature selection in the LASSO model, highlighting optimal lambda values with vertical dashed lines. (c) Depicts the process of AA marker gene identification using the SVM-RFE algorithm. (d) Venn diagram demonstrates the overlap of genes identified by both SVM-RFE and LASSO methods. Five overlapping genes (ALOX15, SLC40A1, LCN2, CREB5, and SLC7A11) were identified as potential diagnostic markers for AA. AA, alopecia areata; SVM-RFE, support vector machine-recursive feature elimination; LASSO, least absolute shrinkage and selection operator.

Figure 6. Validation of Marker Genes for Alopecia Areata (AA). (a) Hierarchical validation in the GSE68801 dataset, with subgroups SAA (32 cases) and NSAA (28 cases) based on clinical parameters. SLC40A1, LCN2, CREB5, and SLC7A11 showed significantly lower expression in the SAA group. (b) External validation using the GSE80342 dataset (12 AA lesions, 3 NC scalps) revealed significant differences in gene expression between AA and NC groups, particularly lower expression of SLC40A1, LCN2, CREB5, and SLC7A11 in AA. AA, alopecia areata patients; NC, normal controls.